[docs] [template] [data] Refactor current LLM Batch inference template #59897
base: master

Conversation
…full 2M

…of batch size, concurrency, more refs to docs links, refactor quantization and model parallelism section for more readability, add image validation, mention anyscale runtime, pin datasets version
/gemini review
nrghosh left a comment:
thanks @Aydin-ab - see also the Cursor/Gemini comments on the code. As long as you're able to run the examples successfully, they should be free of serious bugs now.
Code Review
The pull request introduces new LLM batch inference examples for both text and vision data, along with their corresponding CI configurations and helper scripts. The changes effectively split the existing template into two distinct, independently readable content pieces, which is a good refactoring step. The new examples demonstrate how to use Ray Data LLM APIs for batch inference, including data preparation, processor configuration, and scaling considerations. The addition of CI scripts ensures these examples remain functional.
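For reference, the pattern these examples exercise looks roughly like this (a minimal sketch based on the documented ray.data.llm vLLM processor API; the model, sampling parameters, and dataset below are placeholders, not the template's actual values):

```python
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

# Configure one vLLM engine replica; batch_size controls rows per batch.
config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model
    concurrency=1,
    batch_size=64,
)

# preprocess builds the chat messages; postprocess extracts the generation.
processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.0, max_tokens=128),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"], **row),
)

ds = ray.data.from_items([{"prompt": "What is Ray Data?"}])
ds = processor(ds)
ds.show(limit=1)
```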
However, there are a few areas that could be improved for robustness and clarity:
- The nb2py.py scripts use specific string matching to modify dataset limits for CI. This approach is brittle and could break if the exact string in the notebook changes (see the sketch after this list).
- Some comments in the Jupyter notebooks are slightly misleading regarding dataset size limits.
- The standalone Python scripts contain hardcoded configuration values that would ideally be configurable for real-world use cases.
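One hypothetical way to avoid the string-matching brittleness (an assumption on our part, not what the PR implements) is to have the example read its row limit from an environment variable that CI can override:

```python
import os

import ray

# Hypothetical knob: CI exports SAMPLE_LIMIT to shrink the run, while the
# default preserves the full-size run described in the notebook.
SAMPLE_LIMIT = int(os.environ.get("SAMPLE_LIMIT", "100000"))

ds = ray.data.range(1_000_000)  # placeholder for the template's real dataset
ds = ds.limit(SAMPLE_LIMIT)     # Dataset.limit caps the number of rows read
print(ds.count())
```

With this, nb2py.py wouldn't need to rewrite notebook source at all; CI would just set the variable.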
nrghosh left a comment:
thanks! minor comments but overall LGTM
- datasets version in the vision template (4.4.2) differs from job.yaml (4.4.1)
- put imports at the top of the Python files (not inline), for style
- potential image-mode issue in batch_inference_vision.py and batch_inference_vision_scaled.py: when you open images with PIL, there can be issues with modes. Something like the following could improve robustness:

```python
from io import BytesIO
from PIL import Image

image = Image.open(BytesIO(image))  # image starts as raw bytes here
if image.mode != 'RGB':             # e.g. palette or grayscale inputs
    image = image.convert('RGB')
```
- partition counts are hardcoded (64/128/256) but the explanation says "2-4x the worker (GPU) count" - so with concurrency=4 and 128 partitions, that's 32x the GPU count (see the sketch after this list)
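A sketch of deriving the partition count from the worker count instead of hardcoding it (hypothetical numbers; concurrency here stands in for the template's GPU worker count):

```python
import ray

concurrency = 4                      # number of GPU workers (placeholder)
num_partitions = concurrency * 4     # upper end of the stated 2-4x rule

ds = ray.data.range(10_000)          # placeholder for the real dataset
ds = ds.repartition(num_partitions)  # 16 blocks for 4 workers, not 128
```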
overall great - extensive use of the APIs, a clear and helpful distinction between batch and online inference, and well-written scaling guidance plus the bits on model parallelism
nrghosh left a comment:
approved with comments
Follow-up of this PR (closed for inactivity over the holidays):
#58571
There are a lot of files changed, but the main technical content to review is in the README.ipynb files.
For context, the goal is to refactor this current template:
https://console.anyscale.com/template-preview/llm_batch_inference
and split it into two: one on text data and the other on vision data.
Both are very similar, but each should be read independently as a distinct piece of content 👍